Toxicological knowledge discovery by mining emerging patterns from toxicity data
Authors
Abstract
Predicting the toxic and environmental risks of chemical compounds is of great importance to all chemical industries [1]. Expert systems have successfully predicted toxic risk by applying established toxicological knowledge, encoded as a knowledge base of structural alerts together with a reasoning model. Their disadvantage is that developing new structural alerts requires considerable time and effort from domain experts. To expedite this process, a software tool has been developed that automatically mines representations of activating features directly from toxicity datasets and presents them in an interpretable form. The knowledge discovery tool applies emerging pattern (EP) mining [2], a form of association rule mining [3] that is well known in computer science but relatively new to chemistry [4]. The EP mining algorithm accepts any data that is expressed as a series of binary properties and divided into two classes, and it extracts patterns of those properties that are frequent within the data and more frequent in one class than in the other. By mining emerging patterns from toxicity datasets encoded as fingerprints of binary descriptors, the tool generates patterns of features that distinguish toxicants from innocuous compounds. These patterns represent potentially activating features of the toxic compounds, which may then be used to define new alerts. The tool was tested on a public dataset of 3489 mutagens and 2981 non-mutagens, encoded as fingerprints of approximately 2000 functional group and ring descriptors. The EPs produced were grouped into a number of hierarchical families, and six EPs that represented distinct chemical classes were selected for manual inspection by a toxicology expert. The relevant literature was then analysed to find a mechanistic rationale for the mined features, which resulted in four new structural alerts for in vitro mutagenicity.
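The EP mining step described in the abstract can be illustrated with a minimal sketch. This is not the authors' tool or algorithm: it is a brute-force enumeration that assumes each compound is represented as the set of binary descriptors switched on in its fingerprint, and it scores a pattern by its support in each class and by the ratio of the two (the growth rate). The function names, thresholds, descriptor names and toy compounds below are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def support(pattern, fingerprints):
    """Fraction of fingerprints that contain every feature in `pattern`."""
    if not fingerprints:
        return 0.0
    return sum(1 for fp in fingerprints if pattern <= fp) / len(fingerprints)


def mine_emerging_patterns(positives, negatives,
                           min_support=0.3, min_growth=2.0, max_size=2):
    """Brute-force sketch of emerging pattern mining (illustrative only).

    positives / negatives: lists of sets, each set holding the names of the
    binary descriptors that are "on" for one compound.
    Returns (pattern, support_pos, support_neg, growth_rate) tuples for
    patterns that are frequent in the positive class and at least
    `min_growth` times more frequent there than in the negative class.
    """
    features = sorted(set().union(*positives))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(features, size):
            pattern = frozenset(combo)
            sup_pos = support(pattern, positives)
            if sup_pos < min_support:
                continue  # not frequent enough in the class of interest
            sup_neg = support(pattern, negatives)
            # A pattern absent from the other class has infinite growth rate.
            growth = float("inf") if sup_neg == 0 else sup_pos / sup_neg
            if growth >= min_growth:
                results.append((combo, sup_pos, sup_neg, growth))
    # Strongest patterns first; ties broken by support, then by name.
    results.sort(key=lambda r: (-r[3], -r[1], r[0]))
    return results


# Toy data with made-up class labels and descriptor names (assumption).
toxic = [
    {"aromatic_nitro", "benzene_ring"},
    {"aromatic_nitro", "benzene_ring", "halide"},
    {"epoxide"},
    {"aromatic_nitro", "carboxylic_acid"},
]
nontoxic = [
    {"benzene_ring", "carboxylic_acid"},
    {"benzene_ring"},
    {"halide", "carboxylic_acid"},
    {"aromatic_nitro", "benzene_ring", "carboxylic_acid"},
]

patterns = mine_emerging_patterns(toxic, nontoxic)
for pattern, sup_pos, sup_neg, growth in patterns:
    print(pattern, sup_pos, sup_neg, growth)
```

Exhaustive enumeration like this is only workable for a handful of descriptors. With roughly 2000 descriptors, as in the dataset above, a practical implementation would need a dedicated mining algorithm that prunes the search space.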
Similar resources
Emerging Pattern Mining To Aid Toxicological Knowledge Discovery
Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging patte...
Automating Knowledge Discovery for Toxicity Prediction Using Jumping Emerging Pattern Mining
The design of new alerts, that is, collections of structural features observed to result in toxicological activity, can be a slow process and may require significant input from toxicology and chemistry experts. A method has therefore been developed to help automate alert identification by mining descriptions of activating structural features directly from toxicity data sets. The method is based...
A data mining approach to employee turnover prediction (case study: Arak automotive parts manufacturing)
Training and adaption of employees are time and money consuming. Employees’ turnover can be predicted by their organizational and personal historical data in order to reduce probable loss of organizations. Prediction methods are highly related to human resource management to obtain patterns by historical data. This article implements knowledge discovery steps on real data of a manufacturing pla...
Data Mining & Knowledge Discovery in Databases: An AI Perspective
Data mining and knowledge discovery have several important application areas and have been topics considered at many AI, database and statistical conferences. Knowledge discovery generally refers to the process of identifying valid, novel and understandable patterns. Knowledge discovery from large databases, often called data mining, refers to the application of ...
Efficient mining of interesting emerging patterns and their effective use in classification
Knowledge Discovery in Databases (KDD), or Data Mining is used to discover interesting or useful patterns and relationships in data, with an emphasis on large volume of observational databases. Among many other types of information (knowledge) that can be discovered in data, patterns that are expressed in terms of features are popular because they can be understood and used directly by people. ...
Journal title:
Volume 5, issue
Pages -
Publication date: 2013